THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS: 

1. A method of identifying a subset of components of a 
system based on data obtained from the system using at least 
one training sample from the system, the method comprising 
the steps of: 

obtaining a linear combination of components of the 
system and weightings of the linear combination of 
components, the weightings having values based on data 
obtained from the at least one training sample, the at least 
one training sample having a known feature; 

obtaining a model of a probability distribution of the 
known feature, wherein the model is conditional on the 
linear combination of components; 

obtaining a prior distribution for the weightings of 
the linear combination of the components, the prior 
distribution comprising a hyperprior having a high 
probability density close to zero, the hyperprior being such 
that it is not a Jeffreys hyperprior; 

combining the prior distribution and the model to 
generate a posterior distribution; and 

identifying the subset of components based on a set of 
the weightings that maximise the posterior distribution. 
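
By way of illustration only, the steps of claim 1 can be sketched in Python under assumptions that the claims themselves do not specify. A binomial logistic likelihood stands in for the model, and a Laplace prior on the weightings stands in for the prior distribution. A Laplace prior arises from a zero-mean normal prior whose variance has an exponential hyperprior, which has its highest density at zero and is not a Jeffreys hyperprior. The posterior is maximised here by proximal gradient steps, a swapped-in optimiser chosen for brevity. The function names and the parameters `lam`, `step` and `iters` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero, exactly zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def map_logistic_laplace(X, y, lam=0.1, step=0.1, iters=2000):
    """Maximise (mean log-likelihood) - lam * sum(|beta_j|) by proximal gradient.

    Assumption: this equals the log posterior (up to scale and a constant)
    for a logistic likelihood with a Laplace prior on each weighting.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Gradient of the mean negative log-likelihood at the current weightings.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += r * xi[j] / n
        # Gradient step on the likelihood, then shrinkage from the prior.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

def select_components(beta, tol=1e-8):
    # The identified subset: components whose weighting is non-zero at the maximum.
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

In this sketch a component belongs to the identified subset only if its weighting is non-zero at the posterior maximum. Larger values of `lam` concentrate more prior mass near zero and so yield smaller subsets.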

2. The method as claimed in claim 1, wherein the step of 
obtaining the linear combination comprises the step of using 
a Bayesian statistical method to estimate the weightings. 

3. The method as claimed in claim 1 or 2, further 
comprising the step of making an a priori assumption that a 
majority of the components are unlikely to be components 
that will form part of the subset of components. 

4. The method as claimed in any one of the preceding 
claims, wherein the hyperprior comprises one or more 
adjustable parameters that enable the prior distribution 
near zero to be varied. 
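
As a hedged illustration of such adjustable parameters, the sketch below assumes, purely for the example, a gamma hyperprior on the prior variance of a weighting, with a hypothetical shape parameter `k` and scale parameter `b`. Neither the gamma family nor these parameter names are taken from the claims. The sketch shows that changing `k` changes how much density the hyperprior places on variances close to zero.

```python
import math

def gamma_density(x, k, b):
    """Density at x > 0 of a gamma distribution with shape k and scale b."""
    log_pdf = (k - 1.0) * math.log(x) - x / b - math.lgamma(k) - k * math.log(b)
    return math.exp(log_pdf)

# Density at a small prior variance for three shape settings (scale fixed at 1).
# Smaller shapes put more density near zero, favouring weightings near zero.
for k in (0.5, 1.0, 2.0):
    print(k, gamma_density(0.01, k, 1.0))
```
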

5. The method as claimed in any one of the preceding 
claims, wherein the model comprises a mathematical equation 
in the form of a likelihood function that provides the 
probability distribution based on data obtained from the at 
least one training sample. 

6. The method as claimed in claim 5, wherein the 
likelihood function is based on a previously described model 
for describing some probability distribution. 

7. The method as claimed in any one of the preceding 
claims, wherein the step of obtaining the model comprises 
the step of selecting the model from a group comprising a 
multinomial or binomial logistic regression, generalised 
linear model, Cox's proportional hazards model, accelerated 
failure model and parametric survival model. 

8. The method as claimed in claim 7, wherein the model 
based on the multinomial or binomial logistic regression 
is in the form of: 

9. The method as claimed in claim 7, wherein the model 
based on the generalised linear model is in the form of: 

$$L = \log p(y \mid \beta, \varphi) = \sum_{i=1}^{N} \left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\varphi)} + c(y_i, \varphi) \right\}$$
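
As a minimal sketch, a generalised linear model log-likelihood in the standard exponential-family form can be evaluated by summing, over the samples, the term (y_i θ_i − b(θ_i)) / a(φ) + c(y_i, φ). The code below assumes this standard form, uses one dispersion function `a` for every sample as a simplification, and checks it on the Poisson case. The function names are hypothetical and not taken from the claims.

```python
import math

def glm_log_likelihood(y, theta, b, a, c, phi=1.0):
    """Sum over samples of (y_i * theta_i - b(theta_i)) / a(phi) + c(y_i, phi)."""
    return sum((yi * ti - b(ti)) / a(phi) + c(yi, phi) for yi, ti in zip(y, theta))

def poisson_log_likelihood(y, mu):
    # Poisson case: theta = log(mu), b(theta) = exp(theta), a(phi) = 1,
    # and c(y, phi) = -log(y!), computed with lgamma(y + 1).
    theta = [math.log(m) for m in mu]
    return glm_log_likelihood(
        y, theta,
        b=math.exp,
        a=lambda phi: 1.0,
        c=lambda yi, phi: -math.lgamma(yi + 1.0),
    )
```

Other members of the family, such as the binomial or normal, would be obtained by supplying different `b`, `a` and `c` functions.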

10. The method as claimed in claim 7, wherein the model 
based on the Cox's proportional hazards model is in the form 

11. The method as claimed in claim 7, wherein the model 
based on the parametric survival model is in the form of: 

12. The method as claimed in any one of the preceding 
claims, wherein the step of identifying the subset of 
components comprises the step of using an iterative 
procedure such that the probability density of the posterior 
distribution is maximised. 



13. The method as claimed in claim 12, wherein the 
iterative procedure is an EM algorithm. 
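
A minimal sketch of an EM iteration of this kind is given below, under simplifying assumptions that are not drawn from the claims: each observation is a single weighting plus normal noise (y_j = β_j + noise), each weighting has a zero-mean normal prior, and the prior variance has an exponential hyperprior with rate γ/2. The prior variances are treated as missing data. The E-step replaces the inverse variance by its conditional expectation, √γ / |β_j|, and the M-step maximises the resulting expected log posterior in closed form. The name `em_sparse_means` and its parameters are hypothetical.

```python
import math

def em_sparse_means(y, sigma2=1.0, gamma=0.25, iters=200):
    """EM for the posterior mode of beta in y_j = beta_j + N(0, sigma2) noise.

    Assumed prior: beta_j | tau_j ~ N(0, tau_j), tau_j ~ Exponential(rate gamma / 2).
    """
    root_gamma = math.sqrt(gamma)
    beta = list(y)  # start from the unpenalised estimate
    for _ in range(iters):
        new_beta = []
        for yj, bj in zip(y, beta):
            if bj == 0.0:
                # Expected inverse variance is infinite: the weighting stays at zero.
                new_beta.append(0.0)
                continue
            # E-step: expected inverse prior variance given the current weighting.
            inv_tau = root_gamma / abs(bj)
            # M-step: maximise -(y - b)^2 / (2 sigma2) - b^2 * inv_tau / 2 over b.
            new_beta.append(yj / (1.0 + sigma2 * inv_tau))
        beta = new_beta
    return beta
```

Under these assumptions the iteration drives weightings with |y_j| below σ²√γ toward zero and shrinks the remainder by that amount, so the surviving non-zero weightings indicate the selected components.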

14. A method for identifying a subset of components of a 
subject which are capable of classifying the subject into 
one of a plurality of predefined groups, wherein each group 
is defined by a response to a test treatment, the method 
comprising the steps of: 

exposing a plurality of subjects to the test treatment 
and grouping the subjects into response groups based on 
responses to the treatment; 

measuring components of the subjects; and 

identifying a subset of components that is capable of 
classifying the subjects into response groups using a 
statistical analysis method. 

15. The method as claimed in claim 14, wherein the 
statistical analysis method comprises the method as claimed 
in any one of claims 1 to 13. 

16. An apparatus for identifying a subset of components of 
a subject, the subset being capable of being used to 
classify the subject into one of a plurality of predefined 
response groups, wherein each response group is formed by 
exposing a plurality of subjects to a test treatment and 
grouping the subjects into response groups based on the 
response to the treatment, the apparatus comprising: 

an input for receiving measured components of the 
subjects; and 

processing means operable to identify a subset of 
components that is capable of being used to classify the 
subjects into response groups using a statistical analysis 
method. 

17. The apparatus as claimed in claim 16, wherein the 
statistical analysis method comprises the method as claimed 
in any one of claims 1 to 15. 

18. A method for identifying a subset of components of a 
subject that is capable of classifying the subject as being 
responsive or non-responsive to treatment with a test 
compound, the method comprising the steps of: 

exposing a plurality of subjects to the test compound 
and grouping the subjects into response groups based on each 
subject's response to the test compound; 

measuring components of the subjects; and 

identifying a subset of components that is capable of 
being used to classify the subjects into response groups 
using a statistical analysis method. 

19. The method as claimed in claim 18, wherein the 
statistical analysis method comprises the method as claimed 
in any one of claims 1 to 13. 

20. An apparatus for identifying a subset of components of 
a subject, the subset being capable of being used to 
classify the subject into one of a plurality of predefined 
response groups wherein each response group is formed by 
exposing a plurality of subjects to a compound and grouping 
the subjects into response groups based on the response to 
the compound, the apparatus comprising: 

an input operable to receive measured components of 
the subjects; and 

processing means operable to identify a subset of 
components that is capable of classifying the subjects into 
response groups using a statistical analysis method. 

21. The apparatus as claimed in claim 20, wherein the 
statistical analysis method comprises the method as claimed 
in any one of claims 1 to 15. 

22. An apparatus for identifying a subset of components of 
a system from data generated from the system from a 
plurality of samples from the system, the subset being 
capable of being used to predict a feature of a test sample, 
the apparatus comprising: 

a processing means operable to: 

obtain a linear combination of components of the system 
and obtain weightings of the linear combination of 
components, each of the weightings having a value based on 
data obtained from at least one training sample, the at 
least one training sample having a known feature; 

obtain a model of a probability distribution of a 
second feature, wherein the model is conditional on the 
linear combination of components; 

obtain a prior distribution for the weightings of 
the linear combination of the components, the prior 
distribution comprising an adjustable hyperprior which 
allows the prior probability mass close to zero to be varied, 
wherein the hyperprior is not a Jeffreys hyperprior; 

combine the prior distribution and the model to 
generate a posterior distribution; and 

identify the subset of components having component 
weights that maximize the posterior distribution. 

23. The apparatus as claimed in claim 22, wherein the 
processing means comprises a computer arranged to execute 
software. 



24. A computer program which, when executed by a computing 
apparatus, allows the computing apparatus to carry out the 
method as claimed in any one of claims 1 to 13. 

25. A computer readable medium comprising the computer 
program as claimed in claim 24. 

26. A method of testing a sample from a system to identify 
a feature of the sample, the method comprising the steps of 
testing for a subset of components that are diagnostic of 
the feature, the subset of components having been determined 
by using the method as claimed in any one of claims 1 to 15. 

27. The method as claimed in claim 26, wherein the system 
is a biological system. 

28. An apparatus for testing a sample from a system to 
determine a feature of the sample, the apparatus comprising 
means for testing for components identified in accordance 
with the method as claimed in any one of claims 1 to 15. 

29. A computer program which, when executed on a 
computing device, allows the computing device to carry out a 
method of identifying components from a system that are 
capable of being used to predict a feature of a test sample 
from the system, and wherein a linear combination of 
components and component weights is generated from data 
generated from a plurality of training samples, each 
training sample having a known feature, and a posterior 
distribution is generated by combining a prior distribution 
for the component weights comprising an adjustable 
hyperprior which allows the probability mass close to zero 
to be varied, wherein the hyperprior is not a Jeffreys 
hyperprior, and a model that is conditional on the linear 
combination, to estimate component weights which maximise 
the posterior distribution. 



30. A method of identifying a subset of components of a 
biological system, the subset being capable of predicting a 
feature of a test sample from the biological system, the 
method comprising the steps of: 

obtaining a linear combination of components of the 
system and weightings of the linear combination of 
components, each of the weightings having a value based on 
data obtained from at least one training sample, the at 
least one training sample having a known feature; 

obtaining a model of a probability distribution of the 
known feature, wherein the model is conditional on the 
linear combination of components; 

obtaining a prior distribution for the weightings of 
the linear combination of the components, the prior 
distribution comprising an adjustable hyperprior which 
allows the probability mass close to zero to be varied; 

combining the prior distribution and the model to 
generate a posterior distribution; and 

identifying the subset of components based on the 
weightings that maximize the posterior distribution. 